Wigwams GUI
created by: Krzysztof Polanski
contact: k.t.polanski@warwick.ac.uk

1. THE PURPOSE OF THIS GUI

The role of the Wigwams GUI is to help the researcher easily mine multiple time course expression data sets for modules using Wigwams. Every function can be called independently, but the GUI makes the algorithm easier to use without affecting its performance.

To run the GUI, navigate Matlab to the GUI folder and type Wigwams_GUI into the console.

2. PREPARING TO RUN WIGWAMS

In order to run Wigwams, you need a dataset structure as created by the dataset creation GUI. Once you have that, import it via the "Import Data" button. This will automatically import a list of seed genes, which is also stored in the dataset creation output, and both "Import Data" and "Import Seeds" will turn green.

Select a job name for your run in the "job name" field. This will be used in naming the files created along the way, as well as the final exports.

Choose a metric from the "metric" drop down. The script natively supports the Pearson and Spearman correlation coefficients. If you wish to use a custom metric, prepare a Matlab function that accepts two column vectors as input and returns a single numeric value as output. After selecting "custom metric", import your Matlab function with the "Import Metric" button, then use the drop down to select whether the metric's values should be sorted in ascending or descending order so that they run from most to least correlated.
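
As an illustration of the custom metric format, the function below is a minimal sketch of a cosine similarity metric. The name cosine_metric is hypothetical and not part of Wigwams; the function only follows the interface described above (two column vectors in, one numeric value out) and would be saved as cosine_metric.m.

```matlab
function r = cosine_metric(x, y)
% COSINE_METRIC  Hypothetical custom metric, not shipped with Wigwams.
%   x, y - column vectors holding two expression profiles of equal length
%   r    - a single numeric value; larger values mean more similar profiles
r = (x' * y) / (norm(x) * norm(y));
end
```

With this metric, larger values indicate stronger similarity, so the sorting option in the drop down should be the one that ranks larger values as more correlated.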

If you wish to annotate your results further (turning probe names into gene names, adding information about gene function etc.), import your annotation with the "Import Annotation" button. The annotation needs to be a Matlab cell array (saved as .mat) whose first column contains the gene/probe names found in the dataset structure used as input.
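
A minimal sketch of preparing such an annotation file follows. The probe names, the extra columns, the variable name annotation and the file name my_annotation.mat are all hypothetical; this README does not state whether the GUI expects a particular variable name inside the .mat file.

```matlab
% Hypothetical annotation: column 1 holds the gene/probe names exactly as
% they appear in the dataset structure; the other columns are free-form.
annotation = { ...
    'probe_0001', 'GENE_A', 'transcription factor'; ...
    'probe_0002', 'GENE_B', 'kinase'; ...
    'probe_0003', 'GENE_C', 'unknown function'};

% Save the cell array as a .mat file for the "Import Annotation" button.
save('my_annotation.mat', 'annotation');
```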

3. RUNNING WIGWAMS

Running each of the stages of the Wigwams algorithm is done via the "RUN" button in the appropriate part of the GUI. While the process is running, the button will be yellow, and feedback may appear in the Matlab console. Once the stage is complete, the button will turn green.

The GUI's default parameter values are those used for the analysis in the article. The analysis used the Pearson Correlation Coefficient and had 6 time course data sets. The input file for the analysis, Wigwams_model_analysis_shuffled.mat, is included in the folder. A second file, Wigwams_model_analysis.mat, is provided wherein the non-DEG profiles are not shuffled. To allow the most computationally intensive step of this analysis to be skipped, the initial module list is also provided. If you wish to use it, skip step 1 and set the job name to "article".

4. STEP 1 - MODULE CREATION

The first step of Wigwams is to mine the time course data sets for all cases of statistically significant dependent co-expression.

THIS IS THE MOST COMPUTATIONALLY INTENSIVE PART OF THE ALGORITHM. You may wish to run it on a faster machine and transfer the created module_structure file to your computer for further analysis. The module_structure file will be saved in the Wigwams GUI folder.

Parameters:
	-	set sizes - the number of genes most correlated to the seed gene that should be evaluated for overlaps; multiple set sizes can be supplied in order to detect dependently co-expressed gene groups of varying sizes
	-	alpha - the significance threshold; a Bonferroni correction is subsequently applied to limit false positives arising from multiple testing
	-	correlation net - if genes in the evaluated set fail to meet this correlation threshold, the algorithm deems them not correlated enough to the seed gene and doesn't analyse the resulting overlaps
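
The roles of these parameters can be sketched on toy data as below. This is an illustrative sketch only and not the Wigwams implementation: the random data, the variable names, the parameter values and the number of tests are hypothetical, and the correlation net check reflects one reading of the description above.

```matlab
% Toy expression data (hypothetical): rows are genes, columns are time points.
rng(1);
expr_a = randn(50, 8);    % time course data set A
expr_b = randn(50, 8);    % time course data set B
seed = 1;                 % row index of the seed gene
set_size = 10;            % one entry of the "set sizes" parameter
corr_net = 0.7;           % hypothetical "correlation net" value
alpha = 0.05;             % significance threshold before correction
n_tests = 1000;           % hypothetical number of tests performed

% Pearson correlation of every gene with the seed: centre each profile,
% scale it to unit length, then take dot products with the seed profile.
centre_rows = @(m) bsxfun(@minus, m, mean(m, 2));
unit_rows = @(m) bsxfun(@rdivide, m, sqrt(sum(m .^ 2, 2)));
to_seed = @(m, s) unit_rows(centre_rows(m)) * unit_rows(centre_rows(m(s, :)))';
r_a = to_seed(expr_a, seed);
r_b = to_seed(expr_b, seed);

% The set_size genes most correlated with the seed in each data set.
% The seed itself (r = 1) is included here; this README does not say
% whether Wigwams counts it.
[~, order_a] = sort(r_a, 'descend');
[~, order_b] = sort(r_b, 'descend');
top_a = order_a(1:set_size);
top_b = order_b(1:set_size);

% Genes shared by both top sets: the overlap that would be evaluated.
overlap = intersect(top_a, top_b);

% One reading of the correlation net: every gene in each evaluated set must
% reach the threshold, otherwise the overlap is not analysed.
passes_net = all(r_a(top_a) >= corr_net) && all(r_b(top_b) >= corr_net);

% Bonferroni correction: divide alpha by the number of tests.
alpha_corrected = alpha / n_tests;
```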

Advanced parameters:
	-	no DEG matrix - tick this if you didn't provide DEG information when creating your input structure
	-	have correlation matrix - tick this if you're running module creation a second time on an input structure you have previously analysed; be sure that the job name is identical to the one used in the prior run
	-	have pre-generated p-values - tick this if you're running module creation for a combination of set sizes and number of time course data sets you have analysed previously; be sure that the job name is identical to the one used in the prior run

5. THE MODULE_STRUCTURE FILES

The modules produced by Wigwams at each stage of the analysis are saved in module_structure files, and can be easily accessed and analysed. The module structure itself is a Matlab cell array, with each row corresponding to a module and the columns containing the following information:
	-	numerically encoded time course data sets spanned (corresponding to dataset.conditions)
	-	the gene that was used as a "seed" to create the module, numerically encoded (corresponding to dataset.genes)
	-	the log10-scale P-value obtained during statistical evaluation of the dependence of the observed co-expression; module_structure files only contain the modules that are deemed statistically significant
	-	the set size used to obtain the module (the number of top correlated genes that were analysed for an overlap in each of the data sets)
	-	the gene membership of the module, numerically encoded (corresponding to dataset.genes)

Columns 2, 3 and 4 lose relevance past the module creation stage.
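
The column layout can be explored as in the sketch below. The dataset and module_structure variables here are small hypothetical stand-ins built in place; the sketch assumes dataset.conditions and dataset.genes are cell arrays of names, which this README does not confirm.

```matlab
% Toy stand-ins (hypothetical) for the dataset structure and a module list.
dataset.conditions = {'cond_1', 'cond_2', 'cond_3'};
dataset.genes = {'g1', 'g2', 'g3', 'g4', 'g5', 'g6'};
module_structure = { ...
    [1 2],   3, -6.2, 50,  [1 3 4]; ...
    [1 2 3], 5, -9.8, 100, [2 5 6]};

for m = 1:size(module_structure, 1)
    conds    = dataset.conditions(module_structure{m, 1});  % column 1: data sets spanned
    seed     = dataset.genes{module_structure{m, 2}};       % column 2: seed gene
    log_p    = module_structure{m, 3};                      % column 3: log10-scale P-value
    set_size = module_structure{m, 4};                      % column 4: set size
    members  = dataset.genes(module_structure{m, 5});       % column 5: gene membership
    fprintf('Module %d: seed %s, set size %d, %d genes, data sets %s, log10(P) = %.1f\n', ...
        m, seed, set_size, numel(members), strjoin(conds, ', '), log_p);
end
```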

6. PROCEDURE RUN ORDER

The script allows the user to run the remaining three steps in any order, via the advanced options for each step. However, changing the order of merging -> sweeping -> thresholding found in the article is not recommended.

7. STEP 2 - MERGING

Merging deals with the redundancy among modules spanning the same time course data set subsets, turning a long list of small, redundant modules into a short list of large, non-redundant modules.

A log file is created with information about how many modules were merged into each module present in the list after this procedure.

To help select the overlap proportion for merging, the Histogram button generates a histogram of the overlaps between pairs of modules covering the same data set combination. The overlaps span the interval (0, 1].

Parameters:
	-	overlap proportion - at least this much of the smaller module needs to be made up by the overlap of the two modules for merging to take place
	-	module means correlation threshold - if the overlap proportion criterion isn't met, but the mean expression profiles of the modules satisfy the correlation threshold for each time course data set the module spans, merging still takes place
	-	gene cutoff correlation threshold - genes only featured in the smaller module will get added as members of the merged module if their expression profiles satisfy this correlation threshold for each time course data set the module spans, otherwise the genes will be discarded
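
The overlap proportion criterion can be sketched as follows. The two modules and the threshold value are hypothetical, and the sketch covers only the overlap test: the module means and gene cutoff correlation thresholds are not applied, so the merged module here is a plain union.

```matlab
% Gene memberships (numeric indices) of two hypothetical modules spanning
% the same time course data set combination.
module_a = [1 2 3 4 5 6 7 8];
module_b = [5 6 7 8 9];
overlap_threshold = 0.5;   % hypothetical "overlap proportion" setting

% Overlap expressed as a proportion of the smaller module.
shared = intersect(module_a, module_b);
proportion = numel(shared) / min(numel(module_a), numel(module_b));

% Merge when the overlap makes up enough of the smaller module.
merged = [];
if proportion >= overlap_threshold
    merged = union(module_a, module_b);
end
```

Here 4 of the 5 genes in the smaller module are shared, giving a proportion of 0.8, so the two modules are merged.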

Advanced parameters:
	-	delete instead of merging - check this to automatically reject all smaller modules that would have been merged with larger modules; this results in a large amount of gene information loss and is not recommended

8. STEP 3 - SWEEPING

Sweeping is the process of removing redundancy among modules that span different time course data set combinations - if a module features information similar to another module that spans more time course data sets, the module spanning fewer time course data sets will be discarded.

Parameters:
	-	overlap proportion - at least this much of the module spanning fewer time course data sets needs to be made up by the overlap of the two modules for it to be discarded
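
A minimal sketch of this criterion, under stated assumptions: the modules and the threshold value are hypothetical, and the sketch only compares how many data sets each module spans, without checking how the two data set combinations relate to each other.

```matlab
% Hypothetical modules: cell 1 = data sets spanned, cell 2 = gene members.
wide   = {[1 2 3], [1 2 3 4 5 6]};   % spans three data sets
narrow = {[1 2],   [2 3 4 5 9]};     % spans two data sets
sweep_threshold = 0.75;              % hypothetical "overlap proportion" setting

% Overlap expressed as a proportion of the module spanning fewer data sets.
shared = intersect(narrow{2}, wide{2});
proportion = numel(shared) / numel(narrow{2});

% The narrower module is discarded when enough of it is covered.
discard_narrow = numel(narrow{1}) < numel(wide{1}) && ...
    proportion >= sweep_threshold;
```

Here 4 of the 5 genes in the narrower module also appear in the wider one, giving a proportion of 0.8, so the narrower module would be discarded.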

Advanced parameters:
	-	respect seed affiliation of modules - check this to have sweeping run individually for each group of modules with the same seed gene; this is not recommended, as seed genes are merely used as reference profiles for module creation and redundant gene groups can be centred around different seed genes

9. STEP 4 - MODULE LENGTH THRESHOLDING

This procedure discards all modules that are too small, in an attempt to truncate the module list to the modules that will be of more interest to the researcher.

Parameters:
	-	vector of minimal module sizes - the i'th position is the minimal number of genes a module spanning i+1 data sets must feature to avoid being discarded; the vector must contain one fewer entry than the number of time course data sets analysed by Wigwams
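
The indexing of this vector can be sketched as follows, assuming a hypothetical analysis of 4 time course data sets. The module list and the size values are made up for illustration, and the sketch is not the Wigwams implementation.

```matlab
% With 4 data sets the vector has 3 entries: minimal sizes for modules
% spanning 2, 3 and 4 data sets respectively (hypothetical values).
min_sizes = [10 8 5];

% Toy modules: column 1 = data sets spanned, column 2 = gene members.
modules = { ...
    [1 2],     1:12; ...
    [1 3],     1:7; ...
    [1 2 3],   1:8; ...
    [1 2 3 4], 1:4};

% A module spanning k data sets is checked against entry k-1 of the vector.
n_spanned = cellfun(@numel, modules(:, 1));
n_genes   = cellfun(@numel, modules(:, 2));
required  = min_sizes(n_spanned - 1);

% Keep only the modules that reach their minimal size.
keep = n_genes(:) >= required(:);
kept_modules = modules(keep, :);
```

In this toy list the first and third modules are kept; the second (7 genes against a minimum of 10) and the fourth (4 genes against a minimum of 5) are discarded.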

10. EXPORTING THE RESULTS

Once a Wigwams analysis is complete, the obtained modules can be exported for further work and biological data mining.

Options:
	-	format - the format of the files the expression plots will be stored in; .eps stores all the plots in one file, whilst the other options will create a new folder with separate image files for each module
	-	BiNGO - check this to obtain a BiNGO-compatible export of the module list
	-	MEME - check this to obtain a MEME-LaB compatible export of the module list
	-	the final drop down menu allows you to select which stage of the analysis to export, and should be set to "cut modules" (after length thresholding, the default) unless the order of the procedures was modified

Upon setting the options as desired, press the "Export Modules" button.